Using Randomized Quantile Residuals to Assess Model Fit of Generalized Linear Mixed Models



Julia Piaskowski

May 14, 2025

https://some-link-to-talk.html

Linear Mixed Model

\[\mathbf{Y = Xb + Za + \epsilon} \]

\[\epsilon \sim N(0, \sigma^2 \mathbf{I_n}) \] \[a \sim N(0, \sigma_a^2) \]

\[ y_i|z_i \sim𝑁(x_i \beta, \sigma^2/r_i) \]

Standard Residual Plots

Standard Residuals

Raw \[\epsilon_i = Y_i - \hat{Y_i} \]

(internally) Studentized:

\[\epsilon_i = \frac {Y_i - \hat{Y_i}} {\hat{s}}\]

Pearson/Scaled:

\[ \epsilon_i = \frac {Y_i - \hat{Y_i}} {sd(Y_i)} \]

GLMM Formulation (binomial example)

Dependent variable: \(Y_{ij}\) (survived)

survival: \(p_i = Y_i/N\)

\[Y_{ij}|r_j \sim Binomial(N, \pi_{ij})\]

\[\eta_{ij} = \eta + \alpha_i + r_j\] (\(\eta\) = mean, \(\alpha_i\) = treatment, \(r_j\) = grouping variable)

\[\eta_{ij} = log(\frac{\pi_{ij}} {1 - \pi_{ij}} )\]

Raw Residuals from GLMMs Can Be Nonsensical

Raw/Scaled Residuals Can Lack Diagnostic Capabilities for GLMMs

  • Visual patterns are difficult to interpret
  • Over and under-dispersion are difficult to assess
  • Goodness of fit tests are not valid

Different distributional assumptions and mathematical mean/variance relationships of some distributions require a different approach

The Original Quantile Residuals

Background

For \(y_1,...,y_n\) responses:

\[y_i \sim \mathcal{P}(\mu_i, \phi)\]

CDF: \[F(y; \mu, \phi)\]

For a continuous \(F\), \(F(y_i; \mu_i \phi)\) are uniformly distributed, and the quantile residuals are:

\[ r_{i} = \Phi^{-1} \left\{F(y_i; \hat{\mu_i}, \hat{\phi} )\right\}\]

Example: Survival Times of Patients

Survival: \(y_i \sim Exp(\mu_i)\)

\[ \text{log }\mu_i = \beta_0 + \beta_1 \text{ log } x_i \]

Quantile residuals:

\[ r_{i} = \Phi^{-1} \left\{ 1-exp(y_i/\hat{\mu_i}) \right\}\]

Discontinuous CDF, F

\[r_i = \Phi^{-1}(u_i)\] \[ u_i \sim Uniform(a_i, b_j]\] \[a_i = lim_{y \uparrow y_i} F(y; \hat{\mu_i}, \hat{\phi})\]

\[ b_i = F(y_i; \hat{\mu_i}, \hat{\phi}) \]

Binomial Example

\[y_i \sim binomial(n, p_i) \]

\[ \text{logit} (p_i) = \beta_0 + \beta_1x_i \] \[r_i = \Phi^{-1}(u_i)\]

\[ u_i \sim Uniform(a_i, b_j] \]

\[a_i = lim_{y \uparrow y_i} F(y; n, \hat{p_i}) =\displaystyle\sum_{i=0}^{\lfloor k-1 \rfloor} \begin{pmatrix} n \\ i \end{pmatrix} p_i(1-p)^{n-i}\]

\[b_i = F(y_i; n, \hat{p_i}) =\displaystyle\sum_{i=0}^{\lfloor k \rfloor} \begin{pmatrix} n \\ i \end{pmatrix} p_i(1-p)^{n-i}\]

Method implemented in ‘statmod’

Your Image

This Method Has Been Around

Your Image

Randomization is used to produce continuously distributed residuals when the response is discrete or has a discrete component. This means that the quantile residuals will vary from one realization to another for a given data set and fitted model.

–Smythe & Dunn (1996)

DHARMa

(Diagnostics for HierArchical Regression Models)

The DHARMa Process

  1. Model something using a generalized linear model with a chosen distribution and link function.

\[\mathbf{Y = X\beta+Za}\] \[ \mathbf{Y|a\sim G(\mu, R)} \] linear predictor: \(\mathbf{\eta = X\beta}\)

fitted value: \(E(Y) = \mu\)

Link function: \(\eta = g(\mu)\)

The DHARMa Process

For each observation in the data set:

  1. Simulate new observations using fitted value as the mean and model parameters (e.g. shape, scale) for the other distributional parameters.

\[ Y_i \sim G(\hat{\mu_i}, \hat{\phi}) \]

  1. Calculate the quantile for the cumulative distribution function

\[ r_{i} = F(y_i; \hat{\mu_i}, \hat{\phi} )\]

Expectations of the Quantile Residuals

\[ r_{ij} \sim Uniform(0,1) \]

$ r_{ij} = 0\(: everything is larger\) r_{ij} = 1\(: everything is smaller\) r_{ij} = 0.5$: right in the middle

DHARMa runs 250 simulations (per observation) by default, they recommend up to 1000

Poisson Example

\[ (\hat{Y_i} = \lambda = 5,; \quad Y_i = 7 )\]

  1. For a sample size of 150 per group, cohen’s D = 0.5 and alpha = 0.05, the power is 0.99 ## Quantile Residuals for 1 Observation

Poisson GLMM Example

Create a data set with a random effect (“group”, 10 levels), a covariate (“Enviroinment1”) and a Poisson-distributed response variable (“observedResponse”):

testData = createData(sampleSize = 500)

Analyze correctly and incorrectly:

rightModel <- glmer(observedResponse ~ Environment1 + (1|group) , 
                     family = "poisson", data = testData)

wrongModel <- lmer(observedResponse ~ Environment1 + (1|group) , 
                     data = testData)

‘Standard’ Residuals

Quantile Residuals

Correctly Specified Model

Quantile Residuals

Incorrectly Specified Model

Distributions of Quantile Residuals

Quantile Function of the Standard Normal Distribution

Notes on DHARMa Implementation

  • It depends on the simulation functions build into GLMM package (supported packages: lme4, glmmTMB, …)
  • Commonly, only the last stochastic level (e.g. Poisson) is simulated, conditional on the fitted random effects - basically an omnibus test
  • Residuals can be simulated for individuals predictors and tests can be conducted for individual factors
  • for any kind of autocorrelation, conditional simulations are a must so that plots don’t indicate patterns or the tests fail due to autocorrelation

Conditional and Marginal Models

In Progress

(focus on lme4::glmer() and glmmTMB)

Other Functionality

(Implemented in DHARMa)

  • Uniformity test
  • Quantile location tesy
  • Outlier test
  • Tests for spatial and temporal autocorrelation
  • Test for zero-inflation
  • Over/Under dispersion tests

Best Practices

  • Use quantile residuals because they are a better diagnostic feature for checking model fit
  • Many of the tests are quite useful (but, as always, don’t rely on absolute threshold)
  • Outlier test results should be treated with caution
  • RTFM: read the DHARMa vignette; it is very helpful!
  • ….More

Sources

Bates DM, Maechler M, Bolker BM and S Walker (2015). “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

Brooks ME, Kristensen K, van Benthem KJ, Magnusson A, Berg CW, Nielsen A, Skaug HJ, Maechler M and BM Bolker (2017). “glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling.” The R Journal, 9(2), 378-400. doi: 10.32614/RJ-2017-066.

Dunn KP, and GK Smyth (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics 5, 1-10.

Feng et al (2017). Randomized quantile residuals: an omnibus model diagnostic tool with unified reference distribution. arXiv https://doi.org/10.48550/arXiv.1708.08527

Sources

Gelman A and J Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, New York.

Hartig F (2022). DHARMa: Residual Diagnostics for Hierarchical (Multi-Level/Mixed) Regression Models. R package version 0.4.6, https://CRAN.R-project.org/package=DHARMa.

Pinheiro JC and DM Bates (2000). Mixed-Effects Models in S and S-PLUS. Springer Verlag, New York.

Stroup WW, Ptukhina M and J Garai (2024). Generalize Linear Mixed Models: Modern Concepts, Methods and Applications. CRC Press, Boca Raton.


(G)LMMs are hard - harder than you may think based on what you may have learned in your second statistics class….

–Ben Bolker, GLMM FAQ